Skip to content

fix(tests): eliminate test-api shard cross-test contamination (vchord cache, tenant schemas, maintenance routine TOCTOU)#2310

Merged
nicoloboschi merged 4 commits into
mainfrom
fix/test-api-vchordrq-shard-contamination
Jun 19, 2026
Merged

fix(tests): eliminate test-api shard cross-test contamination (vchord cache, tenant schemas, maintenance routine TOCTOU)#2310
nicoloboschi merged 4 commits into
mainfrom
fix/test-api-vchordrq-shard-contamination

Conversation

@nicoloboschi

@nicoloboschi nicoloboschi commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes the recurring test-api shard failures (it had been red across re-runs and blocking every code PR's CI signal). Three independent cross-test contamination / isolation bugs were stacked in the test-api (2/3) shard, each surfacing as a deterministic-looking "infra flake". This PR fixes all three.

1. vchord config-cache leak (the dominant failure — dozens of tests)

test_link_utils.py's ANN tests monkeypatch HINDSIGHT_API_VECTOR_EXTENSION (one sets vchord). That value is read through the process-global config cache (get_config()), and monkeypatch reverts only the env var on teardown — not the cache. Once get_config() caches vchord, every later bank-creating test on that xdist worker builds per-bank indexes with USING vchordrq against the pgvector-only test DB and fails with:

asyncpg.exceptions.UndefinedObjectError: access method "vchordrq" does not exist

Fix: an autouse fixture on TestComputeSemanticLinksAnnPgBouncerSafety that clears the config cache before/after each test (same pattern as test_worker_retry_knobs.py).

2. Half-built tenant schemas in test_maintenance_multitenant

It provisioned 100 tenant schemas with autocommitted CREATE SCHEMA + per-table CREATE TABLE, leaving a window where a schema existed with only some tables.

Fix: wrap provisioning in a single transaction so schemas appear atomically.

3. TOCTOU in the maintenance routines (latent prod bug)

public.banks_needing_consolidation() / public.schemas_with_expired_rows() snapshot the schemas owning a target table from pg_class, then run a dynamic query against each. A schema dropped between snapshot and query (a tenant being deleted/migrated, or the multi-tenant test's teardown running concurrently with test_maintenance_routines) aborts the whole routine:

relation "<schema>.memory_units" does not exist
relation "<schema>.audit_log" does not exist

Fix: forward migration c7e9f1a3b5d2 redefines both routines (CREATE OR REPLACE, public/base-run gated, PG-only) so each per-schema query runs in its own subtransaction that skips the schema on undefined_table / invalid_schema_name / undefined_column. Adds a deterministic regression test (schema with memory_units but no banks table).

Verification

  • vchord repro flips 6 failed → 7 passed; test_link_utils.py 40 passed.
  • Maintenance pair (test_maintenance_routines + test_maintenance_multitenant) green; new resilience regression test passes.
  • Migration: single head, test_migration_shape.py green, applies cleanly (e1f2a3b4c5d6 → c7e9f1a3b5d2).
  • ruff/ty clean (tests excluded from ruff lint per repo config).

Note: test-doc-examples (cli) failures seen on earlier runs are an unrelated Gemini real-LLM timeout / 400 INVALID_ARGUMENT flake, not touched here.

… stop cross-test contamination

The ANN tests in test_link_utils.py monkeypatch
HINDSIGHT_API_VECTOR_EXTENSION (e.g. to "vchord"). That env var is read
through the process-global config cache (get_config()), and monkeypatch
reverts only the env var on teardown — not the cache. Once get_config()
caches "vchord", it persists for the rest of the xdist worker.

Every subsequent bank-creating test on that worker then builds per-bank
vector indexes with `USING vchordrq` against the pgvector-only test DB
and fails with:

    asyncpg.exceptions.UndefinedObjectError: access method "vchordrq" does not exist

cascading across dozens of unrelated tests in the test-api shard
(test_list_documents, test_maintenance_routines, test_mental_models,
test_observations, ...). Because the leak depends on which worker first
populates the cache, the failure looked like a flaky, shard-specific
infra problem.

Fix: add an autouse fixture to the class that clears the config cache
before and after each test, so the cache is rebuilt from the current
env per test and "vchord" can't leak out.
test_maintenance_multitenant provisions 100 tenant schemas by running
CREATE SCHEMA + 5×CREATE TABLE per schema. Each statement autocommitted,
so there was a window where a schema existed with only some of its
tables. The global maintenance routines (public.schemas_with_expired_rows
/ banks_needing_consolidation) discover schemas by table presence and are
exercised concurrently by test_maintenance_routines on another xdist
worker against the shared test DB. They would query a not-yet-created
table in a half-built schema and fail with:

    asyncpg.exceptions.UndefinedTableError: relation "mt<hash>_NNN.memory_units" does not exist

Wrap the whole provisioning in a single transaction so the schemas
become visible to other connections only once fully built.
…utines

public.banks_needing_consolidation() and public.schemas_with_expired_rows()
snapshot the schemas owning a target table from pg_class, then run a dynamic
query against each schema in turn. That is a TOCTOU race: a schema (or its
tables) can be dropped between the snapshot and the per-schema query — a tenant
being deleted, a tenant migration recreating tables, or (in the test suite) the
multi-tenant maintenance test creating/dropping ~100 schemas concurrently with
test_maintenance_routines on the shared DB. The query then aborts the whole
routine with:

    relation "<schema>.memory_units" does not exist
    relation "<schema>.audit_log" does not exist

Forward migration c7e9f1a3b5d2 redefines both routines (CREATE OR REPLACE,
public/base-run gated, PG-only) so each per-schema query runs in its own
subtransaction that skips the schema on undefined_table / invalid_schema_name /
undefined_column instead of failing the scan.

Adds a deterministic regression test (schema with memory_units but no banks
table) for the skip path.
@nicoloboschi nicoloboschi changed the title fix(tests): reset config cache after vchord tests to stop test-api shard contamination fix(tests): eliminate test-api shard cross-test contamination (vchord cache, tenant schemas, maintenance routine TOCTOU) Jun 19, 2026
…op chunks-mode leak

test_memory_defense._make_minimal_engine() builds a MemoryEngine inside a
patch.dict that sets HINDSIGHT_API_LLM_PROVIDER=none. Constructing the engine
calls get_config(), repopulating the process-global config cache from the
patched env — and provider="none" forces retain_extraction_mode="chunks". When
patch.dict restores the env, the cache still holds the "none"/chunks config.

It then leaks to every later test on the same xdist worker: their retains run
in chunks mode (raw text, NO entity extraction), so unrelated assertions fail —
notably the test_observations entity tests ("John/Alice/Nexora entity should
exist"), which presented as a flaky, shard-specific failure (whichever entity
test landed on the poisoned worker).

Drop the config cache after the patched env is restored so the next get_config()
rebuilds from the real env. Reproduced deterministically:

    pytest test_memory_defense.py::test_engine_memory_defense_shares_ext_ctx \
           test_observations.py::test_entity_extraction_on_retain
    # before: entity test FAILED (Insert unit_entities: 0 pairs)
    # after:  passed
@nicoloboschi nicoloboschi merged commit af42382 into main Jun 19, 2026
98 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant